We put forth the idea that Hamilton's equations coincide with deterministic and reversible
evolution. We explore the idea from four different perspectives (mathematics, thermodynamics,
information theory and state mapping) and we show how they in the end coincide. We concentrate on
a single degree of freedom at first, then generalize. We also discuss possible philosophical reasons
why the laws of physics can only describe such processes, even if others must exist.

I. INTRODUCTION

Understanding is being able to look at something from 
different perspectives, and realizing it's the same thing. 
At least to me anyway. There are parts of physics that 
overlap with philosophy and math, and I find that a bet- 
ter understanding of the connection between them pro- 
vides an insight that none of them can provide alone. 

In this work we will look into fundamental questions 
such as: why does classical mechanics have this particular 
mathematical form? Could we have something different, 
or is it necessary? If so, what are the assumptions, either 
physical or philosophical? And why are they encoded in 
math in the way they are? What limitations do those 
assumptions bring? 

The answer that I believe I have found is that classical 
mechanics, in the Hamiltonian formulation, is necessary 
if we are describing a deterministic and reversible process 
for an infinitesimally reducible homogeneous body, one 
that you can think of as being made of an infinite number
of indistinguishable parts.^ That is: classical mechanics
can be derived from those assumptions. 

Instead of taking a direct abstract approach, I'll first 
"reverse engineer" from the equations, adding meaning 
to each piece. It's less tedious and more rewarding as you 
will probably have familiarity with many of the pieces, 
and we can concentrate on the connections. I think it 
will also help to clarify exactly what I mean by each
individual term. We will then restart "from scratch",
defining the state space from first principles, and arrive
at the same conclusions.



II. THE MATHEMATICS 

First we should review the math. I'll frame it from 
a somewhat unusual angle, which can be later mapped 
to a direct physical meaning. Hamilton's equations are 
usually written as: 



\[
\frac{dx}{dt} = \frac{\partial H}{\partial p}, \qquad
\frac{dp}{dt} = -\frac{\partial H}{\partial x}
\]

which I find confusing: on the left side you have functions
of t while on the right side you have functions of x and p.
You cannot substitute x(t) and p(t) in H, or you won't
be able to take the derivatives. What we really mean by
those equations is:

\[
\frac{dx}{dt} = \left.\frac{\partial H}{\partial p}\right|_{(x(t),\,p(t))}, \qquad
\frac{dp}{dt} = -\left.\frac{\partial H}{\partial x}\right|_{(x(t),\,p(t))}
\]

That is, we take the derivative and then calculate it
at the given position and momentum. This equation is 
really telling us two things, and I'd like to break them 
up to make that clear. The first part is: 

\[
\frac{dP}{dt} =
\begin{pmatrix} dx/dt \\ dp/dt \end{pmatrix} =
\begin{pmatrix} S^x(x(t),p(t)) \\ S^p(x(t),p(t)) \end{pmatrix}
\]

where P is our point in phase space. This tells us that the 
evolution is continuous in t, and that it's a function of the 
current state of the system. That is, given the state, we 
are always going to move in a particular direction. S is 
the vector field that tells us where each point is moving, 
and the field lines are the trajectories they are going to 
follow in phase (state) space. 
The second part is:

\[
S^x = \frac{\partial H}{\partial p}, \qquad
S^p = -\frac{\partial H}{\partial x}
\]

^ Each of these terms has multiple meanings in different disciplines,
so I'll try my best to clarify the flavor that is needed here.

This tells us that our S admits a potential H, and
the relationship is the gradient rotated 90 degrees. This 
means S is pointing in the direction of constant H, that 
the field lines of S are the lines of equal H. We also have 
the following relationships: 

\[
\mathrm{div}(S) = \frac{\partial}{\partial x} S^x + \frac{\partial}{\partial p} S^p
= \frac{\partial^2 H}{\partial x\,\partial p} - \frac{\partial^2 H}{\partial p\,\partial x}
= \mathrm{curl}(\mathrm{grad}\,H) = 0
\]

First of all, the divergence of S is the curl of the gradient
of H. This is possible because of the 90 degree rotation
and because in two dimensions the curl is a scalar, like
the divergence. Second, the field S is solenoidal, which
means:

• the net flux of S within a region is zero: the area
of phase space that enters is equal to the area that
exits

• any region transported by S will preserve its area

These are not just consequences: these are necessary and
sufficient conditions. They are equivalent to requiring
that the divergence is zero.

Once these conditions are required, we can construct
the potential H. Suppose, in fact, that you do have a
vector field for which the flux flowing into any region is
zero. Consider two points on the boundary of a region:
the flux from one side equals the flux on the other side.
Now change just one of the half boundaries: the new
region still has zero flux, so the flux from the new half
boundary is equal to the flux of the previous one, and to
any other we would construct. In other words: the flux
across a line only depends on the two extremes. You can
now define a potential H by first arbitrarily choosing a
point for which H is zero, and letting H at any other
point be the flux across a line between the reference and
this point. You can then show that this is indeed the
potential H we started from^. This is just a sketch of
a demonstration, but should give you the idea. This is
very similar to constructing a scalar potential for an
irrotational field. [3]

To sum up, Hamiltonian mechanics is telling us two
things: the change of state is continuous and a function
of the state; the flux of the change is zero or, equivalently,
the area of phase space is conserved. What do these
statements mean physically?



III. THERMODYNAMICS

First of all, let's assume we have a system undergoing
a deterministic and reversible change. The final state is
known given the initial state, but it also means the
transformation is continuous: to be reversible in the
thermodynamic sense we will assume the process is quasi-static
so the change has to be small. This gives us:

\[
P(t + dt) = P(t) + \frac{dP}{dt}\,dt
\]

\[
P(t + dt) = P(t) + S(P(t))\,dt
\]



which is the first equation. For the second equation: if 
the system is truly reversible, it means that no energy is 
lost to heat; if the system is truly deterministic, meaning 
that the future evolution only depends on its state, it 
must be isolated, or the state would have to include the 
one of the external system acting on the first. ^ Therefore, 
we need to assume our system does not acquire or lose 
energy to the outside, or to other degrees of freedom: it 
needs to conserve energy. If we were to apply a force 
on the system, we would not be able to extract or inject 
energy while returning it to its initial state. In other 
words: the energy absorbed by the system on a closed 
trajectory in phase space has to be zero. This means: 

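\[
\begin{aligned}
\Delta E = \oint dE &= \oint \left( dE_{\mathrm{kinetic}} + dE_{\mathrm{work}} \right) \\
&= \oint \left( v\,dp - F\,dx \right) \\
&= \oint \left( \frac{dx}{dt}\,dp - \frac{dp}{dt}\,dx \right) \\
&= \oint \left( S^x\,dp - S^p\,dx \right) = 0
\end{aligned}
\]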



The energy absorbed goes either into kinetic energy $v\,dp$
or into work performed against the system $-F\,dx$. We
recognize v and F as $S^x$ and $S^p$. As you can see, the
absorbed energy along a path is the flux entering through
it; thus energy conservation leads to zero flux over any
region of phase space. These are the same condition.



^ In two dimensions this potential is sometimes referred to as the
stream function. [1] With multiple degrees of freedom, the
Hamiltonian can't simply be the stream function or the vector potential
of the flux.



^ Let's clarify this with an example. Consider a car going at a
constant speed. At first glance, you may consider this system
deterministic because you seem to be able to predict the position and
the (constant) momentum. The truth is that the system is under
two forces: the friction of the air and the push of the engine. At
some point, the car is going to consume its gas and the speed will
start decreasing. We then realize that our initial assumption that
the system was determined only by its position and momentum
was wrong: we needed at least to take into account the level of
gas in the tank to predict when the car would slow down. If
we add that to our state, we notice we do not have a reversible
system, as the energy from the tank is being lost to friction.

Note that if the system absorbed energy along a closed 
trajectory, the integral would be positive and we would 
have flow going inside of the enclosed region; since we 
are spending energy to maintain the closed trajectory, we 
can assume that energy is being dissipated in some other 
degree of freedom, so we will call this case non-reversible. 
If the system released energy along a closed trajectory, 
the integral would be negative, we would have flow going 
outside of the region; since we are extracting energy from 
the closed trajectory, we have to assume that energy is 
being added from some other degree of freedom, external 
to the system, so we will call this case non-deterministic 
because the system is not isolated and its behavior is 
dependent on an external system and its state. Non- 
deterministic in this sense does not imply the behavior 
of the system is random, but that it is determined by 
some other system or degree of freedom. You may find 
these definitions odd, mainly because they use standard 
language in a slightly non-standard way. But you will 
see that this particular meaning will hold for the other 
perspectives. So, in this sense, they are more useful and 
more "true".

To recap, the flow within a region of phase space is 
actually the energy absorbed by the system along its 
boundary. We called non-reversible the case of positive 
energy absorbed, and negative flux (the flux flows out of 
the region); we called non-deterministic the case of negative
energy absorbed, and positive flux. While we know
what the flux corresponds to physically, we don't have a 
good idea of what the area in phase space is. 
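
To make the link between the closed-path energy integral and the flux concrete, here is a minimal numerical sketch. It is only an illustration under my own assumptions: the harmonic oscillator H = (x^2 + p^2)/2, the friction coefficient `GAMMA`, the unit circle as closed path (traversed counter-clockwise) and the helper names are all hypothetical choices, not something taken from the derivation above. By Green's theorem, the loop integral of (S^x dp - S^p dx) equals the divergence of S integrated over the enclosed region, so it should vanish for the Hamiltonian field and not for the damped one.

```python
import math

def loop_energy(field, n=20000, radius=1.0):
    """Approximate the closed-path integral of (S^x dp - S^p dx) around a circle."""
    total = 0.0
    dtheta = 2.0 * math.pi / n
    for k in range(n):
        theta = (k + 0.5) * dtheta              # midpoint rule
        x, p = radius * math.cos(theta), radius * math.sin(theta)
        dx = -radius * math.sin(theta) * dtheta
        dp = radius * math.cos(theta) * dtheta
        sx, sp = field(x, p)
        total += sx * dp - sp * dx
    return total

def hamiltonian_field(x, p):
    # Assumed example: H = (x^2 + p^2) / 2, so S^x = dH/dp = p and S^p = -dH/dx = -x
    return p, -x

GAMMA = 0.1  # hypothetical friction coefficient, chosen only for illustration

def damped_field(x, p):
    # Same oscillator with a friction term in the force: div(S) = -GAMMA everywhere
    return p, -x - GAMMA * p

loop_hamiltonian = loop_energy(hamiltonian_field)
loop_damped = loop_energy(damped_field)
area_times_div = -GAMMA * math.pi   # div(S) integrated over the unit disk
```

In this sketch `loop_hamiltonian` should come out at numerical zero, while `loop_damped` should match `area_times_div`: the energy exchanged along the boundary is the flux through it.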



IV. INFORMATION 

Let's assume we have a distribution $\rho(x,p)$ instead of
a point particle. This, actually, makes a lot more physical
sense because macroscopic systems are distributed
in space, and a point particle is simply a limit where
that distribution is concentrated. Let's suppose, again,
we have a deterministic and reversible evolution, where
each of the elements of the distribution can be followed
independently. That is, we assume the system is infinitesimally
reducible: we can think of it as made of an infinite
number of tiny elements, each with its own state and evolution.
This implies a map between the initial points and
final points in phase space, so that we can keep track of
the evolution of each tiny element. If we also assume the
evolution to be continuous in time, we have, like before:

\[
P(t + dt) = P(t) + \frac{dP}{dt}\,dt
\]

\[
P(t + dt) = P(t) + S(P(t))\,dt
\]



To identify an element in the distribution, and track
that specifically, we need to distinguish it from the others:
we need some extra information. But if the evolution is
truly deterministic and reversible, the extra information
we need to identify an element must be always the same
no matter at what time: once identified at any t it is
identified for all t.

That extra information is what is defined as information
entropy. For a distribution over a discrete set it is
given by:

\[
I = -\sum_i p(i)\,\log(p(i))
\]

where the log is usually taken in base 2. This represents
the average number of bits of information required
to identify one element within the discrete distribution.
But, in our case, we have a continuous distribution in
x and p: clearly the information needed to identify an
element is infinite; but it's also clear that it will be more
difficult to identify an element from a continuous distribution
from 0 to 2 meters than an element from 0 to 1
meter. Yes, mathematically they both require infinite
information, but there is a finite difference: the first one
is twice as hard. In the continuous case, we define the
continuous entropy:

\[
I = -\int \rho(x,p)\,\log(\rho(x,p))\,dx\,dp
\]



which you can think of as the information needed to identify
an element in the distribution minus the entropy for
a uniform distribution from 0 to 1. That is: it's a relative
number, it can be negative, and tells you the number of
bits compared to that. So, for example, a uniform distribution
from 0 to 2 is twice as spread, so the continuous entropy
will be 1, because you need an extra bit of information.
This quick explanation is not meant to derive these concepts,
but just to give you an intuitive understanding so
that we can link it with the rest of the discussion.
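
As a quick numerical check of the "relative number of bits" reading, here is a small sketch in one dimension. The function names and the choice of uniform distributions are my own assumptions for illustration. It evaluates the discrete entropy for equally likely cases and a Riemann sum of the continuous entropy for uniform densities of different widths.

```python
import math

def discrete_entropy_bits(probs):
    """-sum p log2 p over a discrete distribution (zero-probability terms are skipped)."""
    return -math.fsum(p * math.log2(p) for p in probs if p > 0.0)

def uniform_continuous_entropy_bits(width, n=100000):
    """Riemann sum of -rho log2 rho for a uniform density rho = 1/width on [0, width]."""
    rho = 1.0 / width
    dx = width / n
    return -math.fsum(rho * math.log2(rho) * dx for _ in range(n))

bits_four_outcomes = discrete_entropy_bits([0.25] * 4)     # four equally likely cases
entropy_width_1 = uniform_continuous_entropy_bits(1.0)     # the reference distribution
entropy_width_2 = uniform_continuous_entropy_bits(2.0)     # twice as spread
entropy_width_half = uniform_continuous_entropy_bits(0.5)  # half as spread
```

With these definitions the reference width gives zero, doubling the width adds one bit, and halving it removes one bit, which is why the continuous entropy is allowed to be negative.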

So, going back to deterministic and reversible processes:
what we need is for this quantity, the continuous
entropy, to be conserved. Under a bijective transformation,
the entropy transforms as:

\[
I' = I + \int \rho(x,p)\,\log(|J|)\,dx\,dp
\]

where J is the Jacobian of the transformation, which we
calculate by taking the coordinates of the transformed
point $P(t + dt)$ and differentiating them with respect to
the initial coordinates. For the entropy to be
conserved no matter the initial distribution, we need to
require that $|J|$ is one. If we calculate, and disregard the
terms in $dt^2$ or higher, we have:

\[
1 = 1 + \frac{\partial}{\partial x} S^x\,dt + \frac{\partial}{\partial p} S^p\,dt
\]

\[
0 = \frac{\partial}{\partial x} S^x\,dt + \frac{\partial}{\partial p} S^p\,dt
\]



which brings us, again, to the same condition: the area
in phase space is conserved and the flux is zero. But why
exactly? The Jacobian represents how much the space is
stretched at each point. Requiring that the Jacobian is
equal to one everywhere means requiring that the space is not
stretched overall: if you stretch it locally in one direction
then you must do the opposite in the other. No wonder
that the area is conserved. But why do we need to
conserve it to conserve the entropy? Each small element
carries its own entropy contribution, and all of them are
summed through the integral. If phase space is stretched,
the distribution in that neighborhood is stretched too, so
the value of the density $\rho$ at that point will decrease (or
increase if compressed), which will change the contribution
to the total entropy for that element.
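
The first-order statement about the Jacobian can be checked directly. The sketch below is only illustrative: the two linear fields (an undamped and a damped oscillator), the friction value 0.1 and the function name are my own assumptions. It computes the determinant of the Jacobian of the map P -> P + S(P) dt from the partial derivatives of S.

```python
def step_jacobian_det(dSx_dx, dSx_dp, dSp_dx, dSp_dp, dt):
    """Determinant of the Jacobian of P -> P + S(P) dt, given the partial derivatives of S."""
    return (1.0 + dSx_dx * dt) * (1.0 + dSp_dp * dt) - (dSx_dp * dt) * (dSp_dx * dt)

DT = 1e-4

# Assumed Hamiltonian example, S = (p, -x): div(S) = 0
det_hamiltonian = step_jacobian_det(0.0, 1.0, -1.0, 0.0, DT)

# Assumed damped example, S = (p, -x - 0.1 p): div(S) = -0.1
det_damped = step_jacobian_det(0.0, 1.0, -1.0, -0.1, DT)

# Deviations from one: second order in DT in the first case, first order in the second
deviation_hamiltonian = det_hamiltonian - 1.0
deviation_damped = det_damped - 1.0
```

For the divergence-free field the deviation from one is of order dt^2, which is exactly the part we disregard; for the damped field it is div(S) dt plus the same second-order remainder.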

Intuitive arguments are easier to understand: if your
distribution stretches, you will have to identify elements
from a bigger range. If we imagine a uniform distribution
over a region, the entropy increases with the area
covered. In fact, log(A) will give you the entropy: one
unit of area gives you 0 (because it's the reference), two
units gives you 1 (one bit more than the reference) and
so on. So, conserving the area means conserving the
informational entropy.

Now, if the entropy decreases, it's impossible to tell 
past states as we have less and less information about the 
system: this is the non-reversible case, the area shrinks, 
the flux is outward and, as we saw before, energy is lost 
by the system. If the entropy increases, it's impossible to 
tell future states because we will need more and more in- 
formation about the system: this is the non-deterministic 
case, the area grows, the flux is inward and energy is 
gained by the system.** 

We have a direct understanding of the flux (the energy)
and of the area (the entropy), but there is still one
thing missing. When we define our bijective map in the
first equation, we do not get the second equation for free.
Both parts are driven by assuming deterministic and reversible
processes but we seem to have to "double-dip" on
the same assumption. Why is that? Can it be avoided?



** I realize the explanations are very succinct: I am cramming many 
concepts into a few paragraphs. While they may take time to 
sink in, once they do you realize that the connection is that 
straightforward. And it's fascinating. 



V. STATE SPACE DEFINITION 

To answer those questions we are going to start from 
scratch and define states from the ground up. This has 
the extra benefit of clearly spelling out our assumptions 
and answers other questions, such as: why are states 
in classical mechanics vectors in a real space? Why are 
point particles so important to describe macroscopic ob- 
jects (which are definitely not point particles)? This 
treatment is necessarily a bit abstract, and it will take 
considerations from math, physics and a bit of philos- 
ophy. I am not particularly an expert in all of them: 
the aim is simply to have enough ground work for the 
mathematician, the physicist and the philosopher to see 
where this is going. I believe that, to properly treat all 
the sides, one would end up with a much much longer 
work which nobody would be able to read. Details can 
kill you. 

So, let's start from scratch: we have a system. This 
system can be in different configurations, which we call 
states. To each state we assign a label that identifies the 
state. A label can typically be related to: 

• the way it was prepared - it's the state I prepared 
with the following settings 

• an ideal measurement at that time - it's the state 
that has this momentum and this position 

• the future evolution - it's the state that will bend 
right if I put it in a magnetic field 

Now, whenever you talk about measurement in physics, 
especially in quantum, people drag in all sorts of ques- 
tions, both philosophical and methodological. I just want 
to be clear that, for this treatment, we are not interested 
in those. The only things we care about are the labels. 
How you get to those labels, or why we can have those
labels in the first place, are not concerns of ours at this
time. And just to be clear: this is not to trivialize that 
work. Quite the contrary. 

I believe that finding the right labels is the actual problem
in science! Progress in physics resulted from correctly
identifying new labels: energy, entropy, momentum for 
photons, lepton number and so on. The point is that, 
once that work is done, once we do have recipes for 
preparing states, once we do have a way to recognize 
them, we can create our abstract mathematical set of 
states, a label for each. And once we have that set, it does 
not matter what it actually represents: just its mathe- 
matical properties. Two different systems that map to 
the same mathematical structure will have similar prop- 
erties. That's the beauty of math. 

So we have in principle defined our set of states. But 
we can't do much with it. So we'll make the follow- 
ing assumption: the system is infinitesimally reducible. 
That is, the state of the whole system is the state of its 
parts, and the state of each part is in turn equal to the 
state of its parts, and so on ad infinitum. This assump- 
tion, which I'll call the classical assumption, is similar 



to the old Greek philosophical idea of infinite divisibility, 
to which Democritus' atomic idea is opposed. But there 
is a subtle and important difference: nothing here is di- 
vided physically, it's the state that, mathematically, can 
be seen as composed of two parts. 

To clarify what I mean by reducibility in this context,
and how it is different from divisibility, I'll give a few
examples. Consider a living cell: there exists a process
(mitosis) that starts with one cell and divides it into
two. But the state of a cell is not equal to the state of 
two cells. A cell is divisible into two cells but it's not 
reducible to two cells. Consider a magnet: we can think 
of it as made of a north pole and a south pole, so the 
state of the magnet can be described by the state of its 
poles. But if we divide it, we do not get a north pole and 
a south pole. A magnet is reducible to two poles but it's 
not divisible into two poles. 

Consider a photon: it can decay into an electron- 
positron pair. But you can't think of a photon as made 
of an electron and a positron. A photon is divisible into 
a pair but it's not reducible to a pair. Consider a proton: 
as far as we know it's made of quarks and gluons. But 
if we try to divide it, we don't get isolated quarks and 
gluons. The proton is reducible to quarks and gluons but
it's not divisible into them. 

If the state is infinitesimally reducible, we can just 
consider all the possible states of the infinitesimal con- 
stituents. The state of our system will then be fully de- 
scribed by a distribution of its constituents among those. 
For example, if the parts can only be in two states, then 
the state of our system is identified by how much is in 
the first state and how much is in the second. If the parts 
are fully identified by position and momentum, then we
need to keep track of what fraction of the state has what
position and momentum, that is $\rho(x,p)$.

In mathematical terms, the classical hypothesis of in- 
finitesimal reducibility implies that the state space is a 
linear vector space. The basis of that space represents 
the possible states of our infinitesimal elements. Why is 
this so important? Because we are enormously reducing 
the problem! To describe the evolution we do not have to 
handle each state individually, but we can limit ourselves 
to describing the evolution of each infinitesimal element
of the basis of our space. Naturally we can't always make
that assumption, but when we can it's a really powerful
assumption!
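
Here is a minimal discrete sketch of what this reduction buys us. The four-state system, the two maps and the function names are hypothetical choices of mine: the state is a list of fractions over the basis states, and its evolution is fully specified by saying where each basis element goes.

```python
import math

def evolve(state, basis_map):
    """Move the fraction held by each basis element i to basis element basis_map[i]."""
    new_state = [0.0] * len(state)
    for i, fraction in enumerate(state):
        new_state[basis_map[i]] += fraction
    return new_state

def entropy_bits(state):
    """Discrete information entropy of the fractions, in bits."""
    return -sum(f * math.log2(f) for f in state if f > 0.0)

state = [0.5, 0.25, 0.125, 0.125]   # fraction of the system in each basis state
bijective_map = [2, 0, 3, 1]        # a permutation: deterministic and reversible
merging_map = [0, 0, 1, 2]          # two elements land on the same state: not reversible

evolved = evolve(state, bijective_map)
merged = evolve(state, merging_map)
```

Under the permutation the fractions are only relabelled, so `entropy_bits(evolved)` equals `entropy_bits(state)`; under the merging map two elements can no longer be told apart and the entropy drops.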

This tells us why point particles are so special: they 
are the basis, the states for our infinitesimal element. 
But what's important here is that we have defined them 
with a recipe of subdivision. That is: they don't exist 
by themselves. What exists are the full states, the in- 
finite sum over infinite components. We are not taking 
point particles and grouping them together into finite 
systems, we are taking finite systems and breaking them 
up into particles. They should really be called "point- 
like pieces". Later we'll see why the distinction is not 
just academic. 

Ok, we have our state space, which is actually a linear 



vector space. Now we want to study the properties of 
a deterministic and reversible evolution. Turns out that 
infinite dimensional spaces hide things, so we'll start
discrete. Let's consider a finite dimensional space of dimension
N, which has N basis elements. As we said before, we
can simply concentrate on the evolution of the basis elements, so
what we need is simply a map $s_f = m(s_i)$ that tells us
the final state $s_f$ given the initial state $s_i$. If the
evolution is deterministic and reversible, the map is bijective,
an invertible one-to-one correspondence.

Consider now a subset of M basis elements and its 
evolution. We will have the following: 

• the net flux of states across the set is zero: the 
number of states that enter the set is equal to the 
number of states that exit 

• the transformation preserves the number of states 
in the set 

For the first, the set remains the same and elements go 
in and out. And because of the one to one mapping, each 
element that comes in must push another out. For the 
second, the set changes, but because of the one to one 
mapping the number of states is the same. Note that 
these statements are identical to the ones about area in 
the math section. Intuitively, we have already found the 
result: in the limit the area will be the measure of the 
number of states. Mathematically, we need to make the 
limit right. And I only have a slightly convoluted way to 
do that.^ 

The way I understand it, the trick is that we need 
to restrict the argument to depend only on elements of 
a finite region, so that it does not matter whether the 
space of our parameters is open (i.e. infinite). So, let me 
first rephrase the argument in the discrete case. Let's
still consider a set of M basis elements. If we apply the map, the
states will "move" to other states. Some of these states
will go outside the subset and some will stay inside. Since
each state will have to go somewhere, we can say that
$M = N_{gi} + N_{go}$: the number of states that go inside
($N_{gi}$) plus the number of states that go outside ($N_{go}$) is
equal to the states we started with. We can also apply
the map in reverse, and see where they are coming from.
Some are coming from outside the subset and some come
from inside. We can write $M = N_{ci} + N_{co}$: the number
of states that come from inside ($N_{ci}$) plus the number
of states that come from outside ($N_{co}$) is equal to the
number of states in the subset.

Now, consider the states that come from inside and the
states that go inside. Since the map is bijective, we
can make a one-to-one correspondence between the two.
That is, the number of elements that come from inside
must be equal to the number of elements that go inside:

\[
N_{ci} = N_{gi}
\]

If that's the case, it's also true that
$N_{co} = N_{go}$: the number of elements that enter the set is
equal to the number of elements that leave the set. In
other words, we have divided the set of M states into
two subsets in two different ways, but the sizes of those
subsets are the same.

^ Many ideas and elements of this proof were taken from Q.

These two equalities represent the properties as be- 
fore, but now the mapping does not go out of the set of 
M states: the ones that stay inside are mapped to each 
other, the ones that come from outside to the ones that 
go outside (don't care how, but they are in the same num- 
ber). This way we have a chance to keep the mapping 
when we take the limit. 
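
The counting argument is easy to try out on a computer. In this sketch the sizes `N` and `M`, the fixed random seed and the variable names are assumptions of mine; any bijection and any subset would do. A random permutation plays the role of the deterministic and reversible map.

```python
import random

N = 1000                  # total number of basis states
M = 200                   # size of the subset we watch
rng = random.Random(0)    # fixed seed so the run is repeatable

targets = list(range(N))
rng.shuffle(targets)      # targets[i] is the final state of initial state i: a bijection
subset = set(range(M))    # the watched subset

# Forward view: where do the states of the subset go?
n_gi = sum(1 for s in subset if targets[s] in subset)   # start inside, go inside
n_go = M - n_gi                                         # start inside, go outside

# Backward view: where do the states that end up in the subset come from?
sources = {final: initial for initial, final in enumerate(targets)}   # inverse map
n_ci = sum(1 for s in subset if sources[s] in subset)   # end inside, come from inside
n_co = M - n_ci                                         # end inside, come from outside
```

Whatever the permutation, `n_ci` equals `n_gi` and therefore `n_co` equals `n_go`: as many states enter the subset as leave it.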

If we let just N go to infinity, the discussion does not
change: M remains finite and so do all the other numbers.
For the continuous limit, we need to do more work.
Let's start by transporting our discrete basis to the (x,p)
plane, that is each state is labeled with an x and a p: it
can be identified by a point in phase space.

Consider a region R of our parameters: it will contain
a number of elements M whose labels fall within the area.
Let's call each $\xi_k$, $k \in \{1, 2, \ldots, M\}$. We can partition that
area so that each state is in one cell, each cell has one
state and all the area is covered. We could imagine the
area of the cell to signify the uncertainty on our label, but
it's not necessary and anyway it will not matter once the
limit is taken. We chose the regions to be non-overlapping
to signify that they form a basis: none of them can be
reduced to a sum of the others. Note that we are still
in the previous case: under a bijective map the number
of elements does not change. If we assumed each cell to be
of the same size, this would mean that the area
would not change either. But we needn't assume that:
let the area $V_k$ of each cell be different. And we have:

\[
\sum_{k=1}^{M} V_k = V
\]



where V is the total area. We can define a discrete density
function $D'(\xi_k)$

\[
D'(\xi_k) = \frac{1}{\ldots}
\]

so that:

\[
\sum_{k=1}^{M} D'(\xi_k)\,V_k = 1
\]



This sum is finite, no matter the size of M, and it's a
sum of a function multiplied by a volume. As before,
we require a bijective map to be defined on the elements,
and, as before, we know that the number of elements that
start in the set is the same as the number of elements
that end in the set. Summing over those two gives us
two different sums:

\[
\sum_{k=1}^{N_{gi}} D'(\xi_k)\,V_k = F_{gi}
\]

\[
\sum_{k=1}^{N_{ci}} D'(\xi_k)\,V_k = F_{ci}
\]

$F_{gi}$ represents the fraction of the region R covered by the
states that remain inside the region, while $F_{ci}$ represents
the fraction covered by states that are coming from the
region. If we divided into cells of the same size, these
numbers would already be equal, but at this point they
are not. What is always equal are the numbers of points
$N_{ci}$ and $N_{gi}$.

It's time to take the limit. Let's increase M, so the
packing gets tighter, each area $V_k$ gets smaller and the
distance between all points decreases. We do it so that
all products $M V_k$ remain finite. $D'(\xi_k)$ approaches the
normalized density distribution function $D(\xi)$, which is
assumed to be continuous and finite. The first sum becomes
the integral:

\[
\int_{R} D(\xi)\,v = 1
\]

where $v \sim dx\,dp$. The other two sums become:

\[
\int_{R_{gi}} D(\xi)\,dx\,dp = F_{gi}
\]

\[
\int_{R_{ci}} D(\xi)\,dx\,dp = F_{ci}
\]



These integrals are the same: the more we pack the region,
the more the difference between the areas of the
cells becomes negligible. In the limit, each cell will cover
the same area, but since the number of cells is the same
because of the bijective mapping, the result is the same.
Which means: the fraction covered by the region is the
same, therefore the area is the same. This also means
that the area going through the boundary is zero, and if
we have an infinitesimal transformation, the flux corresponds
to the change in area, so it will be zero. You can
now deduce that all areas are conserved too: start with a
region, transform it into the end region and consider the
union of the two; the regions that go inside and come
from inside coincide with the start and end regions, which
means they will have the same area. Phew.
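
The conclusion can be tested numerically by transporting the boundary of a region and measuring the enclosed area. Everything specific in this sketch is an assumption of mine, chosen for illustration: the pendulum Hamiltonian H = p^2/2 - cos(x), the symplectic-Euler step used as the map, the friction value, the starting disk and the function names. The polygon area is only an approximation of the true area, so the comparison is up to a small discretization error.

```python
import math

def shoelace_area(points):
    """Area of a polygon from its vertices (shoelace formula)."""
    total = 0.0
    for (x0, p0), (x1, p1) in zip(points, points[1:] + points[:1]):
        total += x0 * p1 - x1 * p0
    return abs(total) / 2.0

def step(point, dt, friction=0.0):
    """One symplectic-Euler step for an assumed pendulum, H = p^2/2 - cos(x).

    With friction = 0 the step is a composition of two shears, each with unit Jacobian.
    With friction > 0 the momentum is also rescaled by (1 - friction * dt).
    """
    x, p = point
    p = (1.0 - friction * dt) * p - dt * math.sin(x)
    x = x + dt * p
    return x, p

def transport(points, steps, dt, friction=0.0):
    """Apply the step map repeatedly to every point of the boundary."""
    for _ in range(steps):
        points = [step(pt, dt, friction) for pt in points]
    return points

# Boundary of a disk of radius 0.5 centred on (1, 0), sampled densely
SAMPLES = 4000
boundary = [(1.0 + 0.5 * math.cos(2 * math.pi * k / SAMPLES),
             0.5 * math.sin(2 * math.pi * k / SAMPLES)) for k in range(SAMPLES)]

area_start = shoelace_area(boundary)
area_reversible = shoelace_area(transport(boundary, steps=50, dt=0.1))
area_damped = shoelace_area(transport(boundary, steps=50, dt=0.1, friction=0.5))
```

The region is visibly deformed by the frictionless map, yet `area_reversible` should stay equal to `area_start` up to the polygon approximation, while with friction the area shrinks by the factor (1 - friction * dt) at every step.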

As is sometimes the case, something that is evident
intuitively becomes convoluted to demonstrate mathe- 
matically. Naturally there may be a better proof, though 
I would not be surprised if there weren't. The important 
bit here to understand is that a state is a region of phase 
space, not a point. When we take the limit, we are as- 
suming we are getting better and better at labelling our 
state: that we can ideally break up our finite system in 
smaller and smaller pieces, with more refined labels. The 
region covered by each state becomes infinitesimal, but 
not zero. This means that for each region we will have 
an infinite number of states, but a region double the size 
will still contain double the number of states. In other 



words: yes they are both infinite, but the ratio between 
them is finite and well defined. This is the same effect 
we saw when talking about informational entropy: you 
still need an infinite amount of information to identify 
each element, but double the size of the distribution and 
you double the amount of information. It is the same 
effect precisely because the cause is the same: you are 
doubling the number of cases, you are doubling the size 
of the basis. 

The other important thing is that we are defining the 
map on the discrete case and taking the limit while main- 
taining the map. Physically, this means that our deter- 
ministic and reversible process has to be defined at the 
same resolution. That is: you have to be able to deter- 
mine the future behavior of the system with the same 
uncertainty. This is obvious once you think about it, but 
it's precisely what you miss if you don't have the area 
conservation. The last interesting bit is that the area 
conservation is guaranteed only in the limit. In fact, in 
the discrete case you could either have each state cover- 
ing a different amount of phase space, or you could even 
leave that undefined. In other words: the area conserva- 
tion is a consequence of the continuity of the parameters 
x and p.

This pretty much concludes the main discussion: we 
have been able to derive the Hamilton equations for 
one degree of freedom starting from determinism and re- 
versibility. This also allowed for a more in depth look at 
those concepts, and a better understanding of the connec- 
tion between seemingly disparate things. There are still 
details I believe could be improved (why is energy con- 
nected to the state flow? why does the evolution need to 
be continuous? why is a spatial degree of freedom covered 
by just position and momentum?), but the foundations 
are there and I believe are solid. 



VI. GENERALIZATION TO MULTIPLE 
DEGREES OF FREEDOM 



Generalizing to three degrees of freedom means work- 
ing with a six-dimensional space. This will make it im- 
possible to visualize what happens. Still, with the knowl- 
edge we gained before, there is plenty we can understand. 

Notation. First of all, let's introduce some useful
notation.^ $x^i$ will represent the different directions in
space, and $p^i$ the different components of momentum.^



^ The mathematician will forgive me if I use some Riemannian
geometry notation for something that is really symplectic geom- 
etry. I find that this helps create an intuitive understanding of 
what's going on geometrically, and will later allow making an 
interesting parallel. 

^ As a reminder, space and momentum are parameters which we
use to divide our states, and phase space is really the space of 
the basis for the division. Studying the evolution of the basis 
means studying the evolution for all states. 



Greek letters will represent all directions in space and 
momentum. We will use the standard tensor notation, 
and as an analog to the covariant component we define: 



$$S_{x^i} \equiv -S^{p^i}, \qquad S_{p^i} \equiv S^{x^i}$$



We'll see later why this is useful and may even make 
sense. 

Thermodynamics. This approach works as before. 
Energy absorbed along all paths needs to be zero, except 
now paths are in six dimensions. We have: 



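$$\Delta E = \oint \left( S^{x^i}\,dp^i - S^{p^i}\,dx^i \right) = \oint \left( S_{x^i}\,dx^i + S_{p^i}\,dp^i \right) = \oint S_\alpha\,d\alpha = 0$$

The new notation is indeed useful. Stokes' theorem tells us
this is equivalent to setting the curl of $S_\alpha$ to zero:

$$\partial_\alpha S_\beta - \partial_\beta S_\alpha = 0$$

and allows a potential $H$:

$$S_\alpha = \partial_\alpha H$$

which we can rewrite as:

$$d_t x^i = S^{x^i} = S_{p^i} = \partial_{p^i} H$$

$$d_t p^i = S^{p^i} = -S_{x^i} = -\partial_{x^i} H$$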



the Hamilton equations.^ This is a neat compact form,
but we skipped a few questions. What are the $S_\alpha$
components actually doing? And why are the $S^\alpha$ related to
the $S_\alpha$ in that way?
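As a rough numerical illustration of the closed-path condition (a sketch, not part of the derivation), the Python snippet below approximates the loop integral of $S^{x^i} dp^i - S^{p^i} dx^i$ along an arbitrary closed curve in the six-dimensional space. The quadratic Hamiltonian, the damping term, the particular loop and all function names are assumptions chosen only for the example.

```python
import math

def hamiltonian_flow(x, p, m=1.0, k=1.0, c=0.3):
    # Assumed example: H = sum p_i^2/(2m) + (k/2) sum x_i^2 + c*x_0*x_1
    sx = [pi / m for pi in p]        # S^{x^i} =  dH/dp^i
    sp = [-k * xi for xi in x]       # S^{p^i} = -dH/dx^i
    sp[0] -= c * x[1]
    sp[1] -= c * x[0]
    return sx, sp

def damped_flow(x, p, m=1.0, k=1.0, gamma=0.5):
    # Same kinetic and elastic terms plus friction: not derivable from any H
    sx = [pi / m for pi in p]
    sp = [-k * xi - gamma * pi for xi, pi in zip(x, p)]
    return sx, sp

def loop_point(t):
    # An arbitrary closed curve (period 2*pi) mixing all six directions
    x = [math.cos(t), 0.5 * math.sin(2 * t), 0.2 * math.cos(t) + 0.1]
    p = [math.sin(t), 0.3 * math.cos(t), 0.4 * math.sin(3 * t)]
    return x, p

def loop_energy(flow, n_steps=2000):
    # Midpoint-rule estimate of the closed integral of
    # S^{x^i} dp^i - S^{p^i} dx^i, summed over the three d.o.f.
    total = 0.0
    for n in range(n_steps):
        x0, p0 = loop_point(2 * math.pi * n / n_steps)
        x1, p1 = loop_point(2 * math.pi * (n + 1) / n_steps)
        xm = [(a + b) / 2 for a, b in zip(x0, x1)]
        pm = [(a + b) / 2 for a, b in zip(p0, p1)]
        sx, sp = flow(xm, pm)
        for i in range(3):
            total += sx[i] * (p1[i] - p0[i]) - sp[i] * (x1[i] - x0[i])
    return total

print(loop_energy(hamiltonian_flow))  # vanishes up to rounding
print(loop_energy(damped_flow))       # close to -gamma * pi for this loop
```

For the flow that comes from a potential H the integral vanishes up to rounding. The damped flow picks up a contribution proportional to the area the loop encloses in the (x, p) planes, which is exactly what the curl condition rules out.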

State space. For this approach, we will assume we 
have three independent degrees of freedom (d.o.f.). The 
relationships between each $x^i$ and $p^i$ will have to be the same
as in the single d.o.f.: if they weren't, one degree of free-
dom could "tell" there were others, and they wouldn't be
independent. But what is the relationship across them? 
We may be tempted to just assume they are orthogonal, 
but orthogonality in physical space may or may not have 
anything to do with the orthogonality in phase space. 



^ Note that the force acting on the system is conservative. Energy
conserved along all paths also means energy conserved along purely
spatial paths, which is the requirement for a conservative force.






To understand this better, consider a small rectangle 
in phase space and its projection onto a d.o.f. If the pro- 
jected area equals the area of the rectangle, given that 
the area on the d.o.f. is a measure of number of states, 
we can make a one-to-one correspondence between states
on our d.o.f. and states in the rectangle. They are de- 
pendent to the point that they are the same degree of 
freedom. If the projected area is zero, there is no rela- 
tionship between them. Identifying one point within the 
rectangle will tell us nothing more about our d.o.f. In 
other words, they are independent. In phase space,
orthogonality means independence. So, yes, $x^i$ and $x^j$ are
orthogonal; not because of the spatial relationship, but 
because they represent independent d.o.f. 

We can now define an inner product: given two vectors 
V and W, we sum the vector products of their components
in each degree of freedom. This gives us the sum of the 
projected areas made by the rectangle that has those 
vectors as sides. The metric is a measure of states, the 
generalization of the area for the single d.o.f. 
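To make the inner product concrete, here is a minimal Python sketch. The ordering of the components (all positions first, then all momenta), the sign convention for the vector product in each (x, p) plane, and the function name are my assumptions for the example.

```python
def inner(v, w):
    # Sum over the d.o.f. of the signed area that v and w span
    # in each (x^i, p^i) plane. Components are ordered
    # (x1, x2, x3, p1, p2, p3).
    n = len(v) // 2
    return sum(v[i] * w[n + i] - v[n + i] * w[i] for i in range(n))

# Unit vectors along each direction of the six-dimensional space
basis = [[1.0 if a == b else 0.0 for b in range(6)] for a in range(6)]
x1, x2, x3, p1, p2, p3 = basis

print(inner(x1, p1))  # 1.0: same d.o.f., unit area
print(inner(x1, x2))  # 0.0: independent d.o.f. are orthogonal
print(inner(x1, p2))  # 0.0: position of one d.o.f., momentum of another
```

Note that this product is antisymmetric: swapping the two vectors flips the sign, and the product of a vector with itself is zero.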
What must happen under evolution? Each degree of
freedom will conserve its area, as we have seen. Orthog- 
onal directions remain orthogonal because independent 
degrees of freedom will need to remain independent. In 
other words: the inner product that we defined needs to 
be conserved. Under an infinitesimal transformation, we 
have: 

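$$V'^\alpha \omega_{\alpha\beta} W'^\beta = V^\alpha \omega_{\alpha\beta} W^\beta + \left( V^\gamma\, \partial_\gamma S^\alpha\, \omega_{\alpha\beta} W^\beta + V^\alpha \omega_{\alpha\beta}\, \partial_\delta S^\beta\, W^\delta \right) dt + O(dt^2)$$

where $\omega_{\alpha\beta}$ is the matrix of the inner product and the primes mark the transformed vectors.
Consistently with what we did before, we define:

$$S_\alpha \equiv S^\beta \omega_{\beta\alpha}$$

and since $\omega$ is antisymmetric, we have:

$$S_\alpha = -S^\beta \omega_{\alpha\beta}$$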

So we see that, by analogy, we can think of them as 
covariant components. We substitute, simplify and dis- 
regard the higher order in dt: 

$$V^\gamma W^\beta\, \partial_\gamma S_\beta - V^\alpha W^\delta\, \partial_\delta S_\alpha = 0$$



Since this equation must hold for all vectors, it must hold 
for any pair of components independently: 

$$\partial_\alpha S_\beta - \partial_\beta S_\alpha = 0$$

which is the same condition we had with the thermody- 
namic approach. 
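The conservation of the inner product can also be checked numerically. The sketch below estimates the coefficient of dt in the change of the inner product for two flows on two degrees of freedom: one derived from a Hamiltonian and one with added friction. The specific Hamiltonian, the friction coefficient, the test point and vectors, and the function names are all assumptions made for the illustration.

```python
def inner(v, w):
    # Sum of signed areas on each (x^i, p^i) plane; ordering (x1, x2, p1, p2)
    n = len(v) // 2
    return sum(v[i] * w[n + i] - v[n + i] * w[i] for i in range(n))

def hamiltonian_flow(xi):
    # Assumed example: H = (p1^2 + p2^2)/2 + x1^4/4 + x1*x2^2
    x1, x2, p1, p2 = xi
    return [p1, p2, -(x1 ** 3 + x2 ** 2), -2.0 * x1 * x2]

GAMMA = 0.5  # friction coefficient of the non-Hamiltonian example

def damped_flow(xi):
    x1, x2, p1, p2 = xi
    s = hamiltonian_flow(xi)
    return [s[0], s[1], s[2] - GAMMA * p1, s[3] - GAMMA * p2]

def directional_derivative(flow, xi, v, h=1e-5):
    # Central-difference estimate of V^gamma d_gamma S^alpha at the point xi
    plus = flow([a + h * b for a, b in zip(xi, v)])
    minus = flow([a - h * b for a, b in zip(xi, v)])
    return [(a - b) / (2 * h) for a, b in zip(plus, minus)]

def first_order_change(flow, xi, v, w):
    # Coefficient of dt in the change of inner(v, w) under xi -> xi + S dt
    dv = directional_derivative(flow, xi, v)
    dw = directional_derivative(flow, xi, w)
    return inner(dv, w) + inner(v, dw)

xi = [0.7, -0.4, 0.2, 1.1]
v = [1.0, 0.5, 0.0, 0.2]
w = [0.0, 0.3, 1.0, -0.4]
print(first_order_change(hamiltonian_flow, xi, v, w))  # zero up to numerical error
print(first_order_change(damped_flow, xi, v, w))       # about -GAMMA * inner(v, w)
```

With friction the inner product shrinks at a rate proportional to itself, so states are not conserved. For the Hamiltonian flow the first-order term vanishes whatever point and vectors we pick.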

Ok, we found this nice mathematical result by defining 
some kind of metric. But how do things move? As we 
said, the transformation has to preserve two things: ar- 
eas on all d.o.f., and orthogonality between independent 
ones. The flux itself may be hard to visualize. But we 
can imagine a small hypercube of phase space. This will 
have faces along independent degrees of freedom, and 
faces perpendicular to the independent degrees of free- 
dom. The first set can change into any parallelogram 
provided it preserves the area; the second set must pre- 
serve orthogonality so it can change into any rectangle of 
any size. The hypercube can rotate, so that it may mix
d.o.f. The volume can't change: it's given by the product
of the areas on the independent degrees of freedom,
and since those remain the same, the volume remains the 
same. 

Last note: in special relativity, we define the metric
$\eta_{\alpha\beta}$ for an inertial observer; we find the transformations
that preserve that metric; we obtain the Poincaré
group. That is the most general set of transformations
that gives us inertial observers. Here we set a metric
$\omega_{\alpha\beta}$ that allows us to count states on independent degrees
of freedom; deterministic and reversible processes
must conserve that metric so we find the most general 
group with that property; we obtain Hamilton's equa- 
tions. The two metrics are defined on different spaces, 
have different physical meaning, but the mathematical 
process is the same. 



VII. ON THE UNTENABILITY OF CLASSICAL 
MECHANICS 

What I want to discuss here is a somewhat philosoph- 
ical aspect. With the advent of quantum mechanics, we 
all know that classical mechanics is experimentally not 
tenable: it works for macroscopic systems but not for 
microscopic. So, in that sense, it's not "correct". And 
we put that in quotes because one may argue that no theory
is "correct". At any rate: we know that it does not
match experimental data, but could it have been "correct"?
Or is there some fundamental problem with it?

My feeling is that classical mechanics is untenable from 
a logical perspective. 

We saw how Hamiltonian mechanics applies to systems 
that are isolated. Not only are they isolated, but they 
can be reduced into infinitesimal parts, so that each ele- 
ment is isolated. There is no external exchange of energy. 
There is no external exchange of entropy. But if that's 
the case, we can't really tell that the system exists in the
first place: we can't interact with it, we can't detect it.
Its existence is a philosophical question, not a scientific
one. To be able to say something exists, it needs to be 
interacting with something else in the universe. 

Can't this be fixed by adding an external force? In 
fact, isn't gravity always interacting with everything? 
The problem is that all these forces are "optional". Each 
force can be reduced to infinitesimally small contributions,
each independent from the other. The evolution
can be thought of as free motion plus the effect of all forces.
So, even in this case, classical mechanics is still allowing 
for completely isolated systems, not ruling them out. 

But in practice we can use it! So it must be good for 
something, right? If we can disregard that outside interaction,
if the system is isolated "enough", if the external
forces cancel each other out, then the effect is negligible
compared to other effects and our predictions work.

Ok, there seems to be some inconsistency if we are 
interested in deterministic and reversible processes. But 
why are they so important? Or, why do the basic laws we 
write have that property? Can't we just assume some- 
thing else? I don't think so. First of all, we want to 
write laws that tell us how different events in the past 
have caused the present and how the present will cause 
events in the future. If I have this I will have that. Cause 
and effect already implies determinism. But the need for 
these types of processes is really linked to the way we 
label states to begin with. And this is where I go back 
and talk a bit about the problems I disregarded. 

Suppose we have a set of states, and we label them 
with a property that has a different value for each state. 
Position, momentum, color, shape... It does not matter what it
is. Suppose I give you a way to determine that at time
t that particular property had a particular value. Sup- 
pose, though, that I don't give you a process to properly 
prepare it. In fact, no process in the whole universe can 
result in a state where that property is set to a precisely 
determined value. It's always prepared randomly. Now 
you have a problem: you measure that value, but that 
value is always the output of a random process. So your 
measurement is indistinguishable from a random gener- 
ator. How can you be sure you are actually measuring 
something? So, for your measurement to make sense, you 
need a process that deterministically prepares the values. 

Conversely: suppose I give you a way to prepare the 
value, but I don't give you a process to measure it. In 
fact, no process in the whole universe could tell that the 
initial state had a particular value. You can never recon- 
struct it. Now you have the opposite problem: you have 
a value, but nothing depends on it. How can you tell you 
actually prepared something? So, not only do you need 
a way to deterministically prepare the value, you need a 
way to tell it was there. 

For our definition of state to be physically meaning- 
ful, the label needs to be part of a chain of at least one 
deterministic and reversible process. In principle, things 
could still exist that are not part of any deterministic 
chain. But they can't be the subject of physics: they
are not states and you can't write equations of motion.^

But wait a minute: what about random variables? We 
do have statistical processes: we study those all the time 
and we write equations for them! Doesn't that contra- 
dict all I just said, thus proving it nonsense? No. For 
statistical processes, the label is a statistical property. 
Like the average or the standard deviation. You need a 
way to prepare a state with a given average, you need a 
way to measure the standard deviation. All that I said 
before, applies to those quantities as those are the labels. 
It does not apply to the elements of the distribution be- 
cause those are not the labels. 

So we have pinned down this paradox: if everything 
can be known, everything can be controlled, everything 
must be isolated; but if everything is isolated, it cannot 
be controlled or known. This opens up the way to quantum
mechanics, which does partly solve this problem.
Each quantum state has a part that can be prepared, 
known and labelled, and a part that cannot. 



VIII. POSSIBLE GENERALIZATION TO 
QUANTUM MECHANICS 



My hope is that this work can also be generalized 
to quantum mechanics with minor conceptual modifica- 
tions. I have bits and pieces, and hopefully an overall 
picture, but in this business until you have most of the
details you've got nothing. My feeling is that the first part
of the state space definition is the same, applying labels 
to states; while for the second part we will assume the 
system to be irreducible (i.e. we cannot trace the evolu- 
tion of parts of a particle). This means we would need 
some amount of extra information (that we will never 
have) to fully define each part. Each state, then, should 
have some entropy associated with it, which means we 
cannot reduce the spread in position and momentum as 
we please. And you can see already familiar themes. My 
hope is that multiple solid arguments will require the 
state space to be a complex vector space. After that, 
many elements are easy to get: unitary evolution is very
close to having the Schrödinger equation; Hermitian operators
for observables and the projection postulate are
straightforward to derive from the scalar product. While
I still have a lot of work ahead, the hope that the final
derivation will be essentially similar for both classical and
quantum is very appealing to me.



^ There is a slight difference in that the chain could be of states
of different systems, while the assumption we used was that the 
system itself is deterministic and reversible. This would be an 
interesting aspect to explore further as it necessarily would lead 
to different equations. 
IX. CONCLUSION

While there are clearly some details to polish, I think 
the broad picture is firm. It seems to fit like a jigsaw 
puzzle between concepts that appeared, at least to me,
completely disconnected. None of the concepts are new
per se, just their connection. And it is that connection 
that I think gives a deeper understanding. 

I hope you enjoyed this work, and that it gave you new 
insights and ideas to chew on. 